Soccer captioning: dataset, transformer-based model, and triple-level evaluation

نویسندگان

چکیده

This work aims at generating captions for soccer videos using deep learning. The paper introduces a novel dataset, model, and triple-level evaluation. dataset consists of 22k caption-clip pairs three visual features (images, optical flow, inpainting) 500 hours SoccerNet videos. model is divided into parts: transformer learns language, ConvNets learn vision, fusion linguistic generates captions. suggested evaluation criterion captioning models covers levels: syntax (the commonly used metrics such as BLEU-score CIDEr), semantics quality descriptions domain expert), corpus diversity generated captions). shows that the has improved (from 0.07 reaching 0.18) with semantics-related losses prioritize selected words. Semantics-related utilization more (optical normalized score by 27%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

teacher educator evaluation model

اگرکیفیت معلم کلاس برای بهبودیادگیری دانش آموزحیاتی است،پس کیفیت اساتیددانشجو-معلمان، یابه عبارتی معلمین معلمان نیزبرای پیشرفت آموزش بسیارمهم واساسی است.ناگفته پیداست که یک سیستم مناسب آموزش معلمان ،معلمین با کیفیتی را تربیت خواهدکرد.که این کار منجربه داشتن مدارس خوب، ودرنتیجه نیروی کارماهرتروشهروندبهتربرای جامعه خواهدشد. اساتیددانشجو-معلمان نقشی بسیارمهم را در سیستم اموزش معلمان درسراسرجهان ای...

End-to-End Dense Video Captioning with Masked Transformer

Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevent...

متن کامل

Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset

Image captioning task has become a highly competitive research area with application of convolutional and recurrent neural networks, especially with the advent of long short-term memory (LSTM) architecture. However, its primary focus has been a factual description of the images, mostly objects and their actions. While such focus has demonstrated competence, describing the images with non-factua...

متن کامل

Evaluation of Model based Tracking with TrakMark Dataset

We benchmark two tracking methods developed in the INRIA Lagadic team with a TrakMark dataset. Since these methods are based on a 3D model based approach, we selected a dataset named “Conference Venue Package 01” that includes a 3D textured model of a scene. For the evaluation. we compute the error of 3D rotation and translation with the ground truth transformation matrix. Through these evaluat...

متن کامل

MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning

Visual information is quite important for the task of video captioning. However, in the video, there are a lot of uncorrelated content, which may cause interference to generate a correct caption. Based on this point, we attempt to exploit the visual features which are most correlated to the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is propose...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Procedia Computer Science

سال: 2022

ISSN: ['1877-0509']

DOI: https://doi.org/10.1016/j.procs.2022.10.125